WHICH EYE TRACKER IS RIGHT FOR YOUR RESEARCH?
PERFORMANCE EVALUATION OF SEVERAL COST VARIANT EYE TRACKERS


Gregory Funke¹, Eric Greenlee², Martha Carter³, Allen Dukes¹, Rebecca Brown⁴, Lauren Menke⁴

¹Air Force Research Laboratory, Wright-Patterson AFB, OH; ²Texas Tech University, Lubbock, TX;
³Miami University, Oxford, OH; ⁴Ball Aerospace & Technologies Corporation, Fairborn, OH

Though  not  often  mentioned,  the  price  point  of  many  eye  tracking  systems  may  be  a  factor  limiting  their 
adoption  in  research.  Recently,  several  inexpensive  eye  trackers  have  appeared  on  the  market,  but  to  date 
little systematic research has been conducted to validate these systems. The present experiment attempted
to address this gap by evaluating and comparing the gaze tracking accuracy and precision of five eye
trackers: the Eye Tribe Tracker, Tobii EyeX, Seeing Machines faceLAB, Smart Eye Pro, and Smart Eye
Aurora. Results suggest that all evaluated trackers maintained acceptable accuracy and precision, but
lower  cost  systems  frequently  also  experienced  high  rates  of  data  loss,  suggesting  that  researchers  adopting 
low  cost  systems  such  as  those  evaluated  here  should  be  judicious  in  their  research  usage. 


As  noted  by  McCarley  and  Kramer  (2007),  eye  tracking 
has  been  an  important  source  of  information  about  perception 
and cognition for more than 50 years. It has been utilized to
study a diverse range of topics, such as the patterns of
fixations and saccades while reading text (e.g., Rayner, 1998),
the workload of pilots during different phases of flight (e.g.,
Di Nocera, Camilli, & Terenzi, 2007), and the effectiveness of
visual advertisements (e.g., Wedel & Pieters, 2008), among
many others.

However, eye tracking has not been as widely adopted in
research as it could be. Jacob and Karn (2003) observed that
eye tracking has long remained a very promising research tool
that has never been utilized as widely as it potentially could
be. Those authors provide a cogent treatment of the factors
potentially inhibiting wider adoption of eye tracking
methodologies, including limitations and challenges
associated with eye tracking hardware and software, and with
the volume, extraction, and interpretation of the resultant
data.

An  additional  consideration  not  specifically  mentioned  by 
Jacob  and  Karn  (2003)  is  the  cost  of  an  eye  tracker.  System 
prices  typically  scale  with  hardware  capabilities  and  included 
software,  and  may  range  from  thousands  to  hundreds  of 
thousands  of  dollars,  potentially  putting  eye  trackers  beyond 
the  means  of  many  laboratories.  However,  a  few  very 
inexpensive  (i.e.,  less  than  $1,000  US)  eye  trackers  have 
begun to appear on the market, such as the Eye Tribe Tracker
(theeyetribe.com/), the Tobii EyeX
(www.tobii.com/xperience/products/), and the GazePoint GP3
(www.gazept.com/product/gazepoint-gp3-eye-tracker/). These
systems  feature  relatively  “no  frills”  hardware  and  little  to  no 
included  software. 

While  these  systems  offer  interested  researchers  a  low 
cost  option  for  inclusion  of  eye  tracking  in  their  research,  few 
evaluations  of  the  technical  capabilities  of  such  systems  have 
been conducted to date (though see Dalmaijer, 2014;
Janthanasub & Meesad, in press; and Ooms, Dupont, Lapon,
& Popelka, 2015, for limited evaluations of the Eye Tribe
Tracker). The purpose of the current evaluation study was to
address this gap by examining the capabilities of two low cost
trackers, the Eye Tribe Tracker and the Tobii EyeX, compared
to two “established” trackers, Seeing Machines faceLAB and
Smart Eye Pro, and a new product, Smart Eye Aurora.

METHODS 

Participants 

In this experiment, 16 people (10 men, 6 women) were
recruited  from  local  universities,  available  Air  Force 
personnel,  and  the  local  community.  They  ranged  in  age  from 
20  to  55  (M  =  29.75,  SD  =  9.71).  Prospective  participant 
observers  were  required  to  have  normal  or  corrected  to  normal 
visual  acuity.  To  assess  the  sensitivity  of  the  examined  eye 
trackers  to  the  presence  of  eye  glasses,  6  of  the  observers  wore 
their  glasses  throughout  the  experiment. 

Apparatus 

Eye  trackers.  Five  eye  trackers  were  chosen  for  inclusion 
in  this  evaluation  study:  the  Eye  Tribe  Tracker,  Tobii  EyeX, 
Seeing  Machines  faceLAB,  Smart  Eye  Pro,  and  Smart  Eye 
Aurora.  These  five  trackers  were  selected  because  of  their 
accessibility  to  our  laboratory  and  because  they  represent  a 
diverse  set  of  relative  price  points,  from  low  (Eye  Tribe 
Tracker,  Tobii  EyeX),  to  medium  (Smart  Eye  Aurora),  and 
high  (Seeing  Machines  faceLAB,  Smart  Eye  Pro). 

All  five  eye  trackers  are  off-head,  optical  tracking 
systems  (see  Figure  1  for  images  of  the  eye  tracker  layouts). 

All  trackers  feature  at  least  two  video  cameras  (Smart  Eye  Pro 
was  employed  using  a  four  camera  set  up)  and  at  least  one 
infrared  emitter.  The  evaluated  trackers  operate  on  the  same 
basic  principles,  i.e.,  infrared  light  reflected  from  the  eye 
(corneal  reflection),  eye  features  such  as  the  pupil,  and  facial 
features  such  as  the  canthus  are  used  to  extract  information 
about  point  of  gaze.  More  precisely,  each  system  reports  the 
on-screen  coordinates  that  correspond  to  the  estimated  point  of 
intersection  between  the  observer’s  gaze  and  the  visual  display 
(readers  interested  in  a  more  comprehensive  understanding  of 
the operation of eye trackers are directed to, e.g., Holmqvist et
al.,  2011). 

Each tracker has a specific recording speed, specified in
hertz (Hz), which represents the number of gaze estimates the
system makes per second. All but one of the trackers included
in this evaluation record at 60 Hz; the exception is the
Smart Eye Pro, which records at 120 Hz. It is worth noting
that the observed recording rates of all systems included in
this evaluation were within approximately 1 Hz of the
manufacturer specified sampling rates.
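As an illustration of how an observed recording rate might be checked against a nominal rate, the following minimal Python sketch estimates the rate from sample timestamps. The function names, the use of the median inter-sample interval, and the synthetic timestamps are assumptions made for illustration; the method actually used in this evaluation is not described in the text.

```python
from statistics import median


def observed_sampling_rate(timestamps_s):
    """Estimate the effective sampling rate (Hz) from sample timestamps in seconds."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps")
    # The median inter-sample interval is robust to occasional gaps
    # caused by dropped samples (an assumption of this sketch).
    intervals = [later - earlier
                 for earlier, later in zip(timestamps_s, timestamps_s[1:])]
    return 1.0 / median(intervals)


def matches_nominal_rate(observed_hz, nominal_hz, tolerance_hz=1.0):
    """True if the observed rate is within the tolerance of the advertised rate."""
    return abs(observed_hz - nominal_hz) <= tolerance_hz


# Synthetic 10-second recording at a nominal 60 Hz (illustrative data only).
timestamps = [i / 60.0 for i in range(600)]
rate_hz = observed_sampling_rate(timestamps)
print(f"observed rate: {rate_hz:.2f} Hz; "
      f"within 1 Hz of 60 Hz: {matches_nominal_rate(rate_hz, 60.0)}")
```

The 1 Hz tolerance mirrors the agreement reported above; a stricter or looser tolerance may be appropriate for other applications.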

Task  environment.  Due  to  space  constraints  associated 
with  deployment  of  the  eye  tracking  systems,  trackers 
evaluated  in  this  experiment  were  split  between  two  identical 
workstations  (see  Figure  1  for  the  layout  of  those  systems). 

At  both  workstations,  task  stimuli  were  presented  to  observers 
on  48.26  cm  Samsung  SyncMaster  940Bx  LCD  monitors.  The 
monitors  were  set  to  a  1280  x  1024  display  resolution  (display 
PPCM  =  34).  Note  that  each  eye  tracker  was  placed  according 
to  manufacturer  recommendations,  and  infrared  emitters  were 
disabled  when  the  associated  system  was  not  in  use. 


Figure  1.  Illustration  of  the  five  eye  tracker  systems  deployed  at  the  two  work 
stations.  The  five  trackers  are:  1)  Smart  Eye  Aurora,  2)  Smart  Eye  Pro,  3)  Tobii 
EyeX,  4)  Eye  Tribe  Tracker,  and  5)  Seeing  Machines  faceLAB. 

This  evaluation  experiment  required  seven  networked 
PCs.  All  PCs  featured  x86  compatible  processors  and  the 
Windows  7  operating  system.  Each  evaluated  eye  tracker  was 
connected  to  a  separate  PC  that  ran  all  associated  software  and 
recorded  gaze  data.  An  additional  PC  presented  and  recorded 
fixation  task  events  (described  below).  The  final  PC  was 
utilized  by  the  experimenter  to  control  the  task  computer  and 
initialize  the  appropriate  eye  tracker  before  the  fixation  task 
began.  The  experimenter’s  PC  also  ran  custom  software  that 
synchronized  system  time  across  all  PCs,  enabling  comparable 
time  stamping  across  computers. 

Procedure 

Upon arrival in the laboratory, prospective participant
observers were required to verbally verify that they had
normal  or  corrected  to  normal  visual  acuity,  and  that  they  were 
wearing  their  corrective  lenses  if  required  to  do  so.  Observers 
were  then  assigned  a  random  schedule  of  exposure  to  each  eye 
tracker  system  under  evaluation. 

Next,  observers  were  seated  at  the  appropriate 
workstation.  The  seated  distance  of  observers  to  the  monitor 
varied slightly based on the height of the observer and the
specific eye tracker being evaluated. Once an observer was
seated  at  an  appropriate  distance  and  height  for  the  tracker,  the 
distance  between  the  observer’s  eye  and  the  display  monitor 
was  recorded.  These  individualized  values  were  then  utilized 
in  calculating  all  associated  visual  angles  for  each 
combination  of  observer  and  eye  tracker.  Across  observers  and 
trackers,  the  mean  seated  viewing  distance  was  70.08  cm. 

Once the observer was seated, the eye tracker system was
calibrated. Up to four calibration attempts were made for
each observer on each eye tracker system. Calibration was
considered successful if tracking could be achieved at an
average 2° visual angle error or less across the display screen,
as reported by the eye tracker’s calibration software. If
calibration could not be achieved at that level within four
attempts, that observer was marked as non-trackable for that
system and calibration was initiated for the next system
according to the observer’s assigned schedule.

The  calibration  procedure  for  each  tracker  was  mostly 
similar.  Calibration  requires  observers  to  gaze  at  a  succession 
of  on-screen  points,  and  based  on  these  data,  the  eye  tracker 
software  attempts  to  accurately  assess  the  point  of  gaze-screen 
intersection.  Tobii  EyeX,  Seeing  Machines  faceLAB,  Smart 
Eye  Pro,  and  Smart  Eye  Aurora  all  employ  a  9-point  (3  x  3) 
calibration  grid;  the  Eye  Tribe  Tracker  utilizes  a  16-point  (4  x 
4) grid. Following calibration, each system provides an
estimate of tracking accuracy, i.e., the degree of error in
assessed gaze location (usually specified in degrees visual
angle).  In  addition,  the  Seeing  Machines  faceLAB  and  the 
Smart  Eye  Pro  and  Aurora  trackers  also  provide  a  measure  of 
the  standard  deviation  associated  with  calibration  accuracy, 
which  is  typically  referred  to  as  the  system’s  precision. 
However,  the  specificity  of  the  accuracy  metric  varied  across 
trackers  included  in  this  evaluation. 

Specifically,  the  Tobii  EyeX  provides  only  a  binary 
calibration  outcome,  i.e.,  “calibrated”  or  “not  calibrated.”  To 
ensure  that  participants  met  the  calibration  inclusion  criteria 
described  previously,  a  follow-up  “calibration  evaluation 
window”  was  required.  In  this  software  (which  was  included 
with  the  EyeX),  the  9-point  calibration  grid  is  re-presented 
with  additional,  larger  circles  (roughly  4.25  cm,  or  2°  visual 
angle) around each point. Overlaid on this display are real-time
gaze location estimates made by the eye tracker.
Observers  in  this  evaluation  were  required  to  serially  gaze  at 
each  of  the  9  calibration  points,  and  if  a  preponderance  of  the 
real-time  estimated  gaze  locations  fell  outside  the  2°  visual 
angle  border,  recalibration  was  initiated. 

The Eye Tribe Tracker is somewhat more informative: it
provides a categorical rating, from 1 to 5, of calibrated
accuracy. The ratings (derived from the manufacturer’s
website, http://dev.theeyetribe.com/start/) are: 1 - recalibrate;
2 - poor (< 1.5° visual angle error); 3 - moderate (< 1° visual
angle error); 4 - good (< 0.7° visual angle error); 5 - perfect
(< 0.5° visual angle error).
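To make the relationship between these categorical ratings and the 2° inclusion criterion concrete, the following Python sketch maps each rating to its upper error bound. The dictionary and function names are hypothetical, and treating a rating of 1 as a failure of the criterion is an assumption, since that rating carries no stated error bound.

```python
# Upper bounds on calibration error (degrees visual angle) for Eye Tribe
# ratings 2-5, as listed above. Rating 1 ("recalibrate") has no stated bound.
EYE_TRIBE_RATING_BOUNDS_DEG = {2: 1.5, 3: 1.0, 4: 0.7, 5: 0.5}


def rating_meets_criterion(rating, criterion_deg=2.0):
    """Return True if a categorical rating guarantees error within the criterion."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    bound = EYE_TRIBE_RATING_BOUNDS_DEG.get(rating)
    # Assumption: a rating without a stated bound is treated as not meeting
    # the criterion, which would trigger another calibration attempt.
    return bound is not None and bound <= criterion_deg


for rating in range(1, 6):
    print(rating, rating_meets_criterion(rating))
```

Under these assumptions, any rating of 2 or higher satisfies the 2° criterion, while a rating of 1 would count as a failed attempt.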

The  remaining  trackers  evaluated  provide  a  numerical 
estimate  of  tracking  accuracy  and  precision.  The  Seeing 
Machines  faceLAB  tracker  outputs  a  display-wide  average, 
while  the  Smart  Eye  Pro  and  Aurora  each  provide  separate 
estimates  for  each  calibration  point. 

Following successful calibration of a tracker, observers
then engaged in the fixation task. During this task, fixation
crosses, which appeared as 60 point (~1.73° visual angle)
Futura Bold “plus” (“+”) signs, were displayed on the
workstation monitor. Crosses were presented in black (RGB:
0, 0, 0; luminance = .18 cd/m²) on a gray (RGB: 240, 240,
240; luminance = 93.53 cd/m²) background. The contrast of
the black cross against the gray background based on the
Michelson contrast ratio ([maximum luminance - minimum
luminance] / [maximum luminance + minimum luminance];
Coren, Ward, & Enns, 1999) was 99.60%.
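The Michelson contrast calculation can be sketched in a few lines of Python. The function name is illustrative, and the inputs are the rounded luminance values reported above, so the result (roughly 99.6%) may differ slightly in the second decimal place from a figure computed on unrounded measurements.

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin), for luminances in cd/m^2."""
    if l_min < 0 or l_max < l_min:
        raise ValueError("expected 0 <= l_min <= l_max")
    if l_max + l_min == 0:
        raise ValueError("contrast is undefined when both luminances are zero")
    return (l_max - l_min) / (l_max + l_min)


# Rounded luminance values reported for the gray background and black cross.
contrast = michelson_contrast(93.53, 0.18)
print(f"Michelson contrast: {contrast * 100:.1f}%")
```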

Crosses  were  presented  serially  (only  one  on  the  screen  at 
a  time)  for  3  seconds.  The  crosses  were  programmed  to  appear 
in  a  random  order  at  36  locations  on  the  screen  -  these 
locations  were  determined  by  dividing  the  screen  into  a  6  x  6 
grid  (see  Figure  2  for  an  illustration).  The  crosses  appeared  at 
the  center  of  each  grid  rectangle.  Observers  were  instructed  to 
fixate  the  center  of  each  cross  as  it  was  presented  on  the 
screen  during  the  fixation  task.  After  the  3  second  presentation 
duration  of  a  cross  had  elapsed,  a  new  cross  was  generated  at 
another  random  location  in  the  grid,  with  the  stipulation  that  a 
cross  was  displayed  at  each  grid  location  five  times  during  the 
fixation  task.  The  fixation  task  was  9  minutes  in  duration, 
during  which  observers  viewed  a  total  of  180  fixation  crosses. 
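The presentation schedule described above can be sketched as follows. This is a minimal illustration rather than the task software used in the evaluation: the names are hypothetical, and the rule that consecutive crosses never share a location is an assumed reading of "another random location."

```python
import random
from collections import Counter

SCREEN_W, SCREEN_H = 1280, 1024   # display resolution in pixels
GRID_SIZE = 6                     # 6 x 6 grid of fixation locations
REPEATS = 5                       # each location is shown five times
CROSS_DURATION_S = 3              # presentation time per cross


def grid_centers(width=SCREEN_W, height=SCREEN_H, n=GRID_SIZE):
    """Pixel coordinates of the center of each cell in an n x n grid."""
    cell_w, cell_h = width / n, height / n
    return [((col + 0.5) * cell_w, (row + 0.5) * cell_h)
            for row in range(n) for col in range(n)]


def fixation_schedule(repeats=REPEATS, seed=None):
    """Random order of cross locations, each location appearing `repeats` times."""
    rng = random.Random(seed)
    schedule = grid_centers() * repeats
    while True:
        rng.shuffle(schedule)
        # Assumed constraint: reshuffle until no location repeats back to back.
        if all(a != b for a, b in zip(schedule, schedule[1:])):
            return schedule


schedule = fixation_schedule(seed=2016)
counts = Counter(schedule)
print(len(schedule), "crosses,",
      len(schedule) * CROSS_DURATION_S / 60, "minutes")
```

With 36 locations shown five times each for 3 seconds, the schedule contains 180 crosses and lasts 9 minutes, matching the task duration reported above.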

Figure  2.  Illustration  of  the  spatial  arrangement  of  the  6x6  grid  of  fixation 
cross  locations  during  the  fixation  task.  Also  depicted  are  the  three  screen 
"zones,"  center,  intermediate,  and  outer  edge,  which  are  presented  here 
bounded  by  dashed  lines  and  in  red,  blue,  and  green,  respectively,  to 
facilitate  comprehension;  the  actual  fixation  task  display  featured  no  such 
screen  demarcations. 

Observers  were  free  to  complete  their  assigned  order  of 
eye  trackers  at  their  convenience  (i.e.,  observers  were  not 
required  to  complete  the  evaluation  in  a  single  session).  Most 
observers  completed  the  evaluation  across  2-3  sessions. 

RESULTS 

In  presenting  the  results  of  our  evaluation,  we  will  begin 
with  issues  of  calibration,  and  then  proceed  to  data  quality, 
and  finally  estimates  of  tracking  accuracy  and  precision. 

Calibration  Outcomes 

The  number  of  attempts  required  to  meet  our  satisfactory 
calibration  criteria  (i.e.,  calibration  resulting  in  2°  visual  angle 
error  or  less,  as  reported  by  the  eye  tracker’s  calibration 
software)  varied  between  observers  and  eye  tracking  systems. 


In  some  cases,  successful  calibration  was  not  achieved, 
resulting  in  a  reduced  sample  size  for  each  of  the  eye  tracking 
systems.  Table  1  presents  the  percentage  of  observers  who 
could  be  calibrated  on  each  eye  tracker  (“Percent  calibrated” 
in the table). To facilitate comprehension, data in the table are
presented for observers without glasses (“No glasses”),
observers who wore their glasses (“Glasses”), and for the total
sample (“All observers”). Also presented in the table are the mean
numbers  of  attempts  required  to  achieve  successful  calibration 
for  each  system. 

Table 1 shows that, for each eye tracker evaluated, the
percentage of successful calibrations was relatively poor for
observers with glasses (0% to 66.66%, compared with 70% to
100% for observers without glasses), resulting in a
drastically reduced sample size to evaluate tracking quality
with glasses. This is not unexpected, however, as the presence
of eye glasses may interfere with or distort eye tracking
attempts, due to factors such as lens thickness and increased
glare (Poole & Ball, 2006).

Gaze  Tracking  Quality 

As  mentioned  above,  each  of  the  eye  tracking  systems 
included  in  the  current  evaluation  provided  an  estimate  of 
observer  gaze  location  multiple  times  per  second.  In  addition 
to  gaze  location,  those  estimates  also  included  an  indicator  of 
data  quality.  Such  metrics  are  provided  by  each  system  and  are 
essentially  qualitative  confidence  measures  based  on  whether 
and  to  what  degree  the  system  was  “locked  on”  to  critical 
gaze-related  features  of  an  observer  necessary  for  the  system 
to  make  an  accurate  estimate  of  gaze  location. 

The  nature  and  organization  of  these  quality  measures 
vary  from  system  to  system.  For  most  systems,  information  is 
provided  indicating  whether  or  not  the  system  was  able  to 
track  a  user’s  gaze,  and  if  so,  whether  this  tracking  is  based  on 
features  from  both  eyes,  a  single  eye,  or  the  position  and 
orientation  of  the  head  alone.  For  systems  that  provided  this 
information,  we  discarded  data  points  for  which  the  system 
was  unable  to  track  the  observer’s  gaze  based  on  both  eyes  - 
per  the  recommendation  of  most  of  the  eye  tracking  system 
manufacturers.  The  Smart  Eye  Aurora  system  did  not  provide 
an  absolute  quality  value  indicating  whether  both  eyes  could 
be  tracked,  so  we  discarded  data  points  during  which  head 
tracking  was  lost.  Table  1  presents  the  percentage  of  usable 
data  for  each  system  remaining  after  data  points  of  insufficient 
quality  were  discarded  (“Percent  usable  data”  in  the  table). 
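The data quality screening described above can be illustrated with a short Python sketch. The record layout and the `both_eyes_tracked` flag are hypothetical; each tracker exposes quality information in its own format, and the Smart Eye Aurora data were screened on head tracking rather than on a both-eyes indicator.

```python
from dataclasses import dataclass


@dataclass
class GazeSample:
    t: float                  # timestamp in seconds
    x: float                  # estimated gaze position, pixels
    y: float
    both_eyes_tracked: bool   # hypothetical quality flag


def usable_samples(samples):
    """Keep only samples for which gaze was tracked using both eyes."""
    return [s for s in samples if s.both_eyes_tracked]


def percent_usable(samples):
    """Percentage of samples remaining after the quality screen."""
    if not samples:
        raise ValueError("no samples recorded")
    return 100.0 * len(usable_samples(samples)) / len(samples)


# Illustrative record: one of four samples fails the quality screen.
record = [
    GazeSample(0.000, 640.0, 512.0, True),
    GazeSample(0.017, 641.5, 511.0, True),
    GazeSample(0.033, 0.0, 0.0, False),
    GazeSample(0.050, 639.0, 513.5, True),
]
print(f"{percent_usable(record):.2f}% usable")
```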

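Examination of Table 1 suggests that the low cost Eye
Tribe Tracker and the Tobii EyeX experienced more frequent
data quality problems than the other, more costly trackers
evaluated, resulting in substantially fewer usable gaze estimate
data points (approximately 78% usable data for each across all
observers, compared with approximately 90% to 100% for the
other trackers).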

Gaze  Tracking  Accuracy  &  Precision 

Calibrated  accuracy  and  precision.  As  mentioned 
previously,  following  calibration,  each  of  the  evaluated  eye 
tracking  systems  provided  a  measure  of  calibration  accuracy. 
In  addition,  the  Seeing  Machines  faceLAB  and  the  Smart  Eye 
Pro  and  Aurora  trackers  also  provided  measures  of  precision. 
These values represent the manufacturer’s best estimate of the
margin of error in their gaze location discriminations. To
facilitate comparisons with observed accuracy and observed
precision measures, Table 1 presents the mean, display-wide,
system-reported, “calibrated” error and precision estimates
(for those systems that output such values).

DISTRIBUTION STATEMENT A. Approved for public release. Cleared, 88PA, Case #2016-1334.
Proceedings of the Human Factors and Ergonomics Society 2016 Annual Meeting

Table 1. Performance of each evaluated eye tracking system.

                        Tracking System
                        Eye Tribe   Tobii       Seeing Machines   Smart       Smart Eye
Evaluated Factors       Tracker     EyeX        faceLAB           Eye Pro     Aurora

No glasses
  Percent calibrated    90.00%      90.00%      80.00%            100.00%     70.00%
  Mean cal. attempts    1.75        1.00        1.29              1.60        1.86
  Percent usable data   77.99%      76.02%      88.17%            100.00%     99.86%
  Cal. angular error    < 0.70°¹    < 2.00°¹    1.07°             0.96°       0.48°
  OAE: Whole screen     1.30°       1.05°       2.40°             1.93°       1.70°
  OAE: Center           1.22°       0.93°       2.65°             1.37°       1.27°
  OAE: Intermediate     1.29°       0.96°       2.49°             1.74°       1.49°
  OAE: Outer edge       1.32°       1.14°       2.30°             2.15°       1.91°
  Cal. precision        N/A²        N/A²        0.84°             1.15°       0.34°
  OP: Whole screen      1.04°       0.67°       1.41°             1.21°       1.20°
  OP: Center            0.61°       0.58°       1.29°             0.89°       0.99°
  OP: Intermediate      0.87°       0.59°       1.34°             1.04°       1.07°
  OP: Outer edge        1.23°       0.73°       1.47°             1.38°       1.32°

Glasses
  Percent calibrated    0.00%       50.00%      33.33%            66.66%      16.67%
  Mean cal. attempts    N/A³        1.67        1.50              1.33        3.00
  Percent usable data   N/A³        84.32%      97.00%            100.00%     99.95%
  Cal. angular error    N/A³        < 2.00°¹    1.10°             0.83°       0.58°
  OAE: Whole screen     N/A³        1.44°       1.25°             1.44°       1.50°
  OAE: Center           N/A³        1.15°       1.22°             1.09°       0.95°
  OAE: Intermediate     N/A³        1.30°       1.23°             1.38°       1.48°
  OAE: Outer edge       N/A³        1.58°       1.27°             1.54°       1.63°
  Cal. precision        N/A²        N/A²        0.73°             1.37°       0.43°
  OP: Whole screen      N/A³        0.85°       0.66°             1.17°       1.16°
  OP: Center            N/A³        0.62°       0.72°             1.09°       0.52°
  OP: Intermediate      N/A³        0.89°       0.73°             1.26°       1.20°
  OP: Outer edge        N/A³        0.88°       0.61°             1.14°       1.26°

All observers
  Percent calibrated    56.25%      75.00%      62.50%            87.50%      50.00%
  Mean cal. attempts    1.75        1.18        1.33              1.52        2.00
  Percent usable data   77.99%      78.29%      89.94%            100.00%     99.87%
  Cal. angular error    < 0.70°¹    < 2.00°¹    1.07°             0.92°       0.49°
  OAE: Whole screen     1.30°       1.16°       2.17°             1.79°       1.68°
  OAE: Center           1.22°       0.99°       2.36°             1.29°       1.23°
  OAE: Intermediate     1.29°       1.05°       2.24°             1.64°       1.49°
  OAE: Outer edge       1.32°       1.26°       2.09°             1.98°       1.87°
  Cal. precision        N/A²        N/A²        0.82°             1.21°       0.35°
  OP: Whole screen      1.04°       0.72°       1.26°             1.20°       1.20°
  OP: Center            0.61°       0.59°       1.17°             0.95°       0.93°
  OP: Intermediate      0.87°       0.67°       1.22°             1.10°       1.09°
  OP: Outer edge        1.23°       0.77°       1.30°             1.31°       1.32°

Note. OAE = observed angular error; OP = observed precision.
¹The Eye Tribe Tracker and Tobii EyeX provide a categorical, rather than
quantitative, measure of calibration accuracy. Please see the Procedure section
for further information.
²The Eye Tribe Tracker and Tobii EyeX do not provide estimates of precision.
³As no participants with glasses could be calibrated on the Eye Tribe Tracker
in this evaluation, these data cannot be presented.

Observed  accuracy  and  precision.  After  removing  data 
points  of  insufficient  quality,  the  gaze-tracking  record  was 
overlaid  with  the  corresponding  stimulus  timing  record  from 
the  fixation  task.  The  pairing  of  these  records  produced  an 
event-related  log  of  gaze  tracking,  which  was  then  used  to 
evaluate  tracker  accuracy  and  precision. 

While  the  fixation  task  involved  immediate  shifts  of  the 
fixation  cross,  gaze  redirection  takes  time  (approximately  100 
ms;  Andreassi,  2007),  and  inclusion  of  such  saccadic  behavior 
would  contaminate  any  estimate  of  gaze  fixation.  To  control 
for  this,  the  analysis  of  gaze  tracking  was  limited  to  the  central 
second  of  each  3-second  stimulus  presentation.  For  each 
sample in this 1-second window, we determined the Euclidean
distance  between  the  assessed  gaze  location  and  the  center  of 
the  presented  fixation  cross.  We  then  computed  the  mean 
difference  for  each  observer  and  tracker  at  each  of  the  36 
fixation  cross  locations  as  an  index  of  observed  accuracy.  The 
standard  deviation  (i.e.,  precision)  of  these  values  was  also 
computed  for  each  stimulus.  Both  accuracy  and  precision 
values  were  then  converted  to  degrees  visual  angle,  calculated 
using  each  individual’s  measured  viewing  distance. 
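The accuracy and precision computation described above might be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the evaluation: the sample format `(t, x, y)`, the function names, the visual angle formula 2·atan(s / 2d), and the use of the sample standard deviation are all assumed; the pixel density of 34 pixels per cm comes from the display description in the Apparatus section.

```python
import math
from statistics import mean, stdev

PPCM = 34.0  # display pixels per centimeter, as reported for the monitors


def pixels_to_degrees(distance_px, viewing_distance_cm, ppcm=PPCM):
    """Convert an on-screen extent in pixels to degrees of visual angle."""
    size_cm = distance_px / ppcm
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * viewing_distance_cm)))


def fixation_error(samples, target_xy, onset_s, viewing_distance_cm):
    """Accuracy and precision (degrees) for one 3-second cross presentation.

    samples: iterable of (t, x, y) tuples, t in seconds, x and y in pixels.
    Only the central second (onset + 1 s to onset + 2 s) is analyzed.
    """
    window = [(x, y) for t, x, y in samples
              if onset_s + 1.0 <= t < onset_s + 2.0]
    if len(window) < 2:
        raise ValueError("too few usable samples in the analysis window")
    distances = [math.hypot(x - target_xy[0], y - target_xy[1])
                 for x, y in window]
    accuracy = pixels_to_degrees(mean(distances), viewing_distance_cm)
    precision = pixels_to_degrees(stdev(distances), viewing_distance_cm)
    return accuracy, precision


# Synthetic 60 Hz record with a constant 34-pixel (1 cm) offset from the target.
target = (640.0, 512.0)
record = [(i / 60.0, target[0] + 34.0, target[1]) for i in range(180)]
acc_deg, prec_deg = fixation_error(record, target, onset_s=0.0,
                                   viewing_distance_cm=70.0)
print(f"accuracy = {acc_deg:.2f} deg, precision = {prec_deg:.2f} deg")
```

As a rough check on the conversion, a 60 point extent (about 2.12 cm, assuming 72 points per inch on screen) at the mean viewing distance of 70.08 cm comes out near 1.73°, in line with the cross size given in the Procedure section.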

As a final consideration, the accuracy and precision of eye
tracking systems typically degrade near the edges of the
screen. To further elucidate eye tracker accuracy in our
evaluation,  we  examined  how  distance  from  the  center  of  the 
screen  affected  eye  tracking  performance.  We  accomplished 
this  by  dividing  the  screen  into  three  concentric  “zones,” 
corresponding  to  the  center,  the  intermediate,  and  the  outer 
edge  of  the  screen.  Figure  2  depicts  these  regions  as  they 
relate  to  the  fixation  task  display. 
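One plausible way to assign the 36 fixation locations to the three zones is by concentric rings of the 6 x 6 grid, as in the Python sketch below. The ring-based rule and the function name are assumptions made for illustration; the exact zone boundaries are defined by Figure 2.

```python
GRID_SIZE = 6
ZONE_BY_RING = ("outer edge", "intermediate", "center")  # ring 0 is outermost


def screen_zone(row, col):
    """Zone of a grid cell, with rows and columns indexed from 0 to 5."""
    if not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE):
        raise ValueError("row and col must be between 0 and 5")
    # Number of cells between this cell and the nearest screen edge.
    ring = min(row, col, GRID_SIZE - 1 - row, GRID_SIZE - 1 - col)
    return ZONE_BY_RING[ring]


zone_counts = {}
for r in range(GRID_SIZE):
    for c in range(GRID_SIZE):
        zone = screen_zone(r, c)
        zone_counts[zone] = zone_counts.get(zone, 0) + 1
print(zone_counts)
```

Under this assumed rule, the center zone holds the 4 innermost locations, the intermediate zone the surrounding 12, and the outer edge the remaining 20.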

In assessing observed tracker accuracy and precision, it
should be noted that data collected from four observers
were excluded from those calculations. Specifically, three
observers  were  judged  to  have  outlier  accuracy  scores  (i.e., 
greater  than  2.5  SD  from  the  mean),  but  only  for  a  single  eye 
tracker  each;  the  associated  systems  were  the  Eye  Tribe 
Tracker,  Seeing  Machines  faceLAB,  and  the  Smart  Eye  Pro.  A 
final  observer’s  Tobii  EyeX  data  had  to  be  excluded  because 
of  a  software  error  that  prevented  common  timestamping 
across  data  sources.  In  all  cases,  data  were  only  excluded  from 
estimates  of  observed  accuracy  and  precision,  and  only  for  the 
specific  systems  affected. 
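The outlier rule described above can be sketched in Python as follows. The function name, the per-tracker grouping of scores, the use of the sample standard deviation, and the example values are all assumptions made for illustration; they are not data from this evaluation.

```python
from statistics import mean, stdev


def flag_outliers(scores, threshold_sd=2.5):
    """Flag observers whose score lies more than threshold_sd SDs from the mean.

    scores: dict mapping observer ID to mean observed error (degrees)
    for a single eye tracker.
    """
    values = list(scores.values())
    if len(values) < 2:
        raise ValueError("need at least two observers")
    m, sd = mean(values), stdev(values)
    if sd == 0:
        return {observer: False for observer in scores}
    return {observer: abs(value - m) > threshold_sd * sd
            for observer, value in scores.items()}


# Hypothetical accuracy scores for one tracker (not data from this study).
example_scores = {
    "P01": 1.0, "P02": 1.1, "P03": 0.9, "P04": 1.2, "P05": 1.0,
    "P06": 0.8, "P07": 1.1, "P08": 1.0, "P09": 0.9, "P10": 5.0,
}
flags = flag_outliers(example_scores)
print([observer for observer, is_outlier in flags.items() if is_outlier])
```

Because an extreme score inflates the standard deviation it is compared against, this rule can only flag a value when the group is reasonably large; with very few observers no score can exceed 2.5 SD.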

Table  1  provides  the  mean  observed  accuracy  and 
precision  for  each  eye  tracker  for  the  whole  screen  and  in  each 
of the three screen zones. Generally speaking, error increased
and precision decreased the farther the fixation cross was
presented from the center of the screen. In addition,
comparisons of calibrated and observed accuracy and
precision reveal that, generally, the evaluated eye tracking
systems were less precise than calibrated estimates
suggested.

DISCUSSION 

Ultimately,  it  is  our  hope  that  this  evaluation  study  will 
serve  other  scientists  as  they  consider  their  choices  regarding 
the  acquisition  and  use  of  eye  tracking  systems  for  research 
and psychophysiological monitoring applications. A key
consideration  for  such  decisions  may  be  the  ability  of  each 
system  to  be  successfully  calibrated  with  different  individuals. 
While the Smart Eye Pro was the most expensive of the tested
systems,  it  also  appears  to  have  had  the  highest  rate  of 
successful  calibrations.  In  experimental  settings,  failures  to 
calibrate  represent  an  exclusionary  criterion,  a  screen  for  study 
participation  that  reduces  the  proportion  of  the  population  that 
may  be  sampled  for  study.  In  applied  settings,  failures  to 
calibrate  may  be  even  more  problematic,  as  they  may  prohibit 
psychophysiological  monitoring  and  any  human-in-the-loop 
systems  that  require  such  monitoring. 

Another  consideration  in  the  selection  and  use  of  eye 
tracking  systems  may  be  tradeoffs  between  price  and  tracking 
performance,  which  are  made  apparent  by  the  current  study. 
While  the  degree  of  error  and  precision  of  the  most  affordable 
systems  are  comparable  to  the  most  expensive  systems,  the 
less  expensive  systems  also  produced  a  greater  proportion  of 
low  quality,  unusable  data.  Though  not  directly  considered  in 
this  evaluation,  such  missing  data  can  artifactually  influence 
estimates  of  the  number  and  duration  of  fixations,  saccadic 
rates,  and  blinks,  all  of  which  are  measures  that  may  be  of 
interest to many human factors professionals (Holmqvist,
Nyström, & Mulvey, 2012). While this may limit the overall
utility  of  such  trackers,  there  are  likely  applications  for  which 
the  missing  data  would  be  less  problematic.  For  example,  for  a 
researcher  interested  in  relative  dwell  time  across  several  areas 
of  interest  in  a  visual  display,  the  low  cost  eye  trackers  might 
be  sufficient  despite  the  likely  high  degree  of  unusable  data. 

It  is  also  worth  noting  that  each  of  the  eye  trackers  differs 
in  terms  of  the  capabilities  of  the  included  software.  Generally 
speaking,  more  expensive  systems  are  packaged  with  software 
that  can  be  used  to  filter  and  process  data,  while  the  less 
expensive  systems  depend  on  the  user  to  generate  and  apply 
their  own  algorithms.  Additionally,  some  information  cannot 
be  obtained  through  the  use  of  the  most  inexpensive  eye 
trackers. For example, tracking and detection of eyelid
behavior (e.g., blinks, PERCLOS) was not a feature of the
software  provided  with  the  Eye  Tribe  Tracker  and  the  Tobii 
EyeX. 

When  selecting  an  eye  tracker  for  either  research  or 
application  purposes,  we  advise  careful  consideration  of  the 
relative  strengths  and  weaknesses  of  the  systems.  The  eye 
trackers  examined  here  represent  only  a  sample  of  the 
available  choices.  It  is  crucial  that  researchers  continue  to 
examine  the  capabilities  and  accuracy  of  eye  tracking  systems 
in  order  to  determine  the  validity  of  the  measures  that  each 
system provides. Future research should extend to other eye
tracking systems and to non-gaze behaviors (e.g., eyelid behavior,
pupillometry), and should examine the effects of calibration
drift (i.e., decreases in tracking accuracy over time) and head
movements on system performance.